Summary

Coarse coded memories have appeared in several neural network symbol processing models, such as
Touretzky and Hinton's distributed connectionist production system DCPS [6,7], Touretzky's distributed
implementation of Lisp S-expressions on a Boltzmann machine, BoltzCONS [8,9], and St. John and
McClelland's PDP model of case role defaults [4]. In order to determine how these models would scale,
one must first have some understanding of the mathematics of coarse coded representations. For example,
the working memory of DCPS, which stores triples of symbols, consists of 2,000 units and can hold roughly
20 items at a time out of a 15,625-symbol alphabet. How would DCPS scale if the alphabet size were
raised to 50,000? With the current alphabet size, how many units would have to be added simply to
double the working memory capacity to 40 triples? We present some analytical results related to these
questions.

A coarse coded symbol memory in its most general form is defined by two parameters: the alphabet
size a and the number of units N. Each unit has a "receptive field" containing some subset of the
alphabet. Symbols are stored in memory by turning on all the units in whose receptive field they fall.
Thus, symbols are represented as distributed patterns of activity, and the units are said to be "coarsely
tuned" because each participates in the representation of more than one symbol. However, our units'
receptive fields are not restricted to contiguous subregions of a multidimensional feature space as are
the "value units" of [1,2,3,5]. They are instead random subsets of a one-dimensional symbol space.

A symbol is deemed present if all its receptors are active (our analysis easily generalizes to a weaker
criterion). As items are added, "ghost" symbols eventually appear; these are symbols which were not
stored, but appear because all their receptors are shared with symbols that were stored. The capacity
or "span" of a memory is the number of symbols k that can be stored before ghosts appear. (A localist
representation, where k = N, is very inefficient for sparse memories with a large alphabet.)
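
The storage and ghost-detection rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `CoarseCodedMemory` and its parameters are hypothetical, and it assumes that every symbol has the same number of receptors, drawn at random from the units.

```python
import random


class CoarseCodedMemory:
    """Toy coarse coded symbol memory (hypothetical names, illustrative only)."""

    def __init__(self, num_units, alphabet_size, pattern_size, seed=0):
        rng = random.Random(seed)
        # Assumption: each symbol gets a random set of `pattern_size` units.
        self.receptors = [
            frozenset(rng.sample(range(num_units), pattern_size))
            for _ in range(alphabet_size)
        ]
        self.active = set()  # units currently turned on
        self.stored = set()  # symbols that were deliberately stored

    def store(self, symbol):
        """Store a symbol by turning on all of its receptors."""
        self.stored.add(symbol)
        self.active |= self.receptors[symbol]

    def is_present(self, symbol):
        """A symbol is deemed present if all of its receptors are active."""
        return self.receptors[symbol] <= self.active

    def ghosts(self):
        """Symbols that appear present although they were never stored."""
        return [
            s for s in range(len(self.receptors))
            if s not in self.stored and self.is_present(s)
        ]


if __name__ == "__main__":
    memory = CoarseCodedMemory(num_units=200, alphabet_size=1000, pattern_size=10)
    for symbol in range(5):
        memory.store(symbol)
    print("active units:", len(memory.active))
    print("ghost symbols:", memory.ghosts())
```

Storing more symbols in the same memory turns on more units, so the list returned by `ghosts()` can only grow; the span is the point at which it first becomes non-empty.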

In this analysis we assume that symbols have a uniform receptor set size L, and that each of the
a symbols is assigned a random L-subset of the N units making up the memory. The probability of a
ghost appearing after k symbols have been stored is given by Equation 1:


$$P_{\text{ghost}}(N, L, k, a) = 1 - \sum_{c=0}^{N} T_{N,L}(k, c)\left[1 - \frac{\binom{c}{L}}{\binom{N}{L}}\right]^{a-k} \qquad (1)$$


T_{N,L}(k,c) is the probability that exactly c units will be active after k symbols have been stored. It
is defined by Equation 2:

$$T_{N,L}(k, c) = \ldots \qquad (2)$$

with T_{N,L}(0,c) = 0 for all c > 0.
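
One way to compute these quantities numerically is sketched below. The recurrence is an assumption derived from the definitions in the text, not a transcription of the authors' Equation 2. It conditions on the number of units already active before a symbol is stored and counts the ways its L receptors can split between active and inactive units. The ghost probability then follows from the chance that an unstored symbol has all L receptors among the c active units, which is C(c,L)/C(N,L). The function names are hypothetical.

```python
from math import comb


def active_unit_distribution(N, L, k):
    """Return a list T where T[c] = P(exactly c units active after k stores).

    Assumed recurrence: a new symbol turns on `new` previously inactive units
    with hypergeometric probability C(N-prev, new) * C(prev, L-new) / C(N, L).
    """
    total = comb(N, L)
    T = [0.0] * (N + 1)
    T[0] = 1.0  # nothing stored yet, so zero units are active with certainty
    for _ in range(k):
        nxt = [0.0] * (N + 1)
        for prev, p in enumerate(T):
            if p == 0.0:
                continue
            for new in range(L + 1):
                ways = comb(N - prev, new) * comb(prev, L - new)
                if ways:
                    nxt[prev + new] += p * (ways / total)
        T = nxt
    return T


def ghost_probability(N, L, k, a):
    """P(at least one of the a - k unstored symbols appears as a ghost)."""
    total = comb(N, L)
    T = active_unit_distribution(N, L, k)
    no_ghost = sum(
        p * (1.0 - comb(c, L) / total) ** (a - k)
        for c, p in enumerate(T)
    )
    return 1.0 - no_ghost


if __name__ == "__main__":
    # Illustrative parameters only; they are not taken from the paper.
    for L in (4, 6, 8, 10):
        print(L, ghost_probability(N=100, L=L, k=5, a=1000))
```

Evaluating `ghost_probability` over a range of L for fixed N, k, and a gives a direct, if slow, way to look for the pattern size that minimizes the ghost probability.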

The  optimal  pattern  size  with  respect  to  N,  k,  and  a  can  be  determined  by  binary  search  on 
Equation  1.  However,  this  may  be  expensive  for  large  N.  A  good  initial  estimate  is  the  L  that  maximizes 
the  following  expression: 


(3) 


We have constructed coarse coded memories of various sizes and measured their capacities experimentally.
The results show good agreement with the predicted values.

We present graphs of the relationships between N, k, a, and P_ghost for optimum pattern sizes, as
determined by Equation 1. A representative graph is shown in Figure 1. The results show an exponential
relationship between a and N/k. Thus, for a fixed alphabet size, the span is proportional to the number
of units. For P_ghost = 0.01 the relationship is:

$$a = e^{\,0.468\,N/k \,-\, 4.76} \qquad (4)$$
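
To see what this relationship implies for the scaling questions raised at the start, the sketch below inverts it for N. It assumes that Equation 4 reads a = exp(0.468 N/k - 4.76), that P_ghost = 0.01, and that the memory uses the optimum pattern size. The constants are a reading of the printed formula, and the helper names are hypothetical.

```python
from math import exp, log

# Assumed constants from Equation 4 (P_ghost = 0.01, optimum pattern size).
SLOPE = 0.468
OFFSET = 4.76


def alphabet_size(units_per_item):
    """Alphabet size a supported at a given ratio N/k."""
    return exp(SLOPE * units_per_item - OFFSET)


def units_needed(alphabet, span):
    """Number of units N needed to store `span` symbols from `alphabet`."""
    return span * (log(alphabet) + OFFSET) / SLOPE


if __name__ == "__main__":
    for alphabet, span in [(15625, 20), (15625, 40), (50000, 20)]:
        print(alphabet, span, round(units_needed(alphabet, span)))
```

Under this reading, N grows linearly with the span k but only logarithmically with the alphabet size a, so doubling the span doubles the number of units, while tripling the alphabet adds comparatively few.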


We compare the capacity obtained using our probabilistic, random receptive fields approach with that
of two other approaches which guarantee a specified span: a binary coding scheme, and an approach
where the overlap between any two patterns is bounded by ⌊(L − 1)/k⌋.